Breast Cancer Research
Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match Breast Cancer Research's content profile, based on 32 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Howard, F. M.; Li, A.; Kochanny, S.; Sullivan, M.; Flores, E. M.; Dolezal, J.; Khramtsova, G.; Hassan, S.; Medenwald, R.; Saha, P.; Fan, C.; McCart, L.; Watson, M.; Teras, L. R.; Bodelon, C.; Patel, A. V.; Symmans, W. F.; Partridge, A.; Carey, L.; Olopade, O. I.; Stover, D.; Perou, C.; Yao, K.; Pearson, A. T.; Huo, D.
Purpose: To test whether histology-derived gene-expression signatures from routine hematoxylin and eosin slides are prognostic for recurrence and predictive of chemotherapy benefit in early breast cancer. Methods: We conducted a multi-cohort study including CALGB 9344 (anthracycline +/- paclitaxel), CALGB 9741 (standard vs dose-dense chemotherapy), a pooled Chicago real-world cohort, and the American Cancer Society (ACS) Cancer Prevention Studies-II and -3. Whole-slide images were processed with a previously described pipeline to generate 61 histology-derived signatures per patient. The primary endpoint was distant recurrence-free interval (DRFI), except in ACS, where breast cancer-specific survival was used. Secondary endpoints include distant recurrence-free survival (DRFS) and overall survival. The most prognostic signature in CALGB 9344, selected by Harrell's C-index, was evaluated in additional cohorts. Signature-treatment interaction was assessed by likelihood-ratio tests. Multivariable Cox models incorporating age, tumor size, nodal status, estrogen/progesterone receptor status, and signature were fit in CALGB 9344 to improve risk stratification. Results: A total of 7,170 patients were included across four cohorts. The top histology-derived signature in CALGB 9344 showed strong prognostic performance for 5-year DRFI (C-index 0.63) and performed well across validation cohorts (C-index 0.60, 0.70, and 0.62 in CALGB 9741, Chicago, and ACS, respectively). The strongest predictive signal for treatment benefit was observed for DRFS. High-risk cases identified by the signature demonstrated greater benefit from taxane in CALGB 9344 (adjusted hazard ratio [aHR] 0.76 for DRFS, 95% CI 0.66-0.88; interaction p=0.028), from dose-dense chemotherapy in CALGB 9741 (aHR 0.69, 95% CI 0.56-0.85; interaction p=0.039), and differential chemotherapy benefit in the Chicago cohort (aHR 0.84, 95% CI 0.59-1.21; interaction p=0.009). 
Combined clinical-histology models improved risk stratification and identified low-risk groups with a 2%-10% risk of distant recurrence or breast cancer death. Conclusion: Histology-derived signatures from H&E images are broadly prognostic and, unlike clinical factors, may predict chemotherapy benefit.
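As an illustration of the metric used above to select the signature, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the authors' pipeline; the toy follow-up times, event flags, and risk scores are invented for the example.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the subject with the
    earlier observed event also has the higher risk score (ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: years to event/censoring, event flag, signature score.
toy_times = [2.0, 4.0, 5.0, 7.0, 9.0]
toy_events = [1, 1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.6, 0.5, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)
print(f"C-index = {toy_c:.2f}")  # 6 of 8 comparable pairs are concordant: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.60 to 0.70 should be read.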
Huang, T.; Koch, F. C.; Peake, D. A.; Adam, K.-P.; David, M.; Li, D.; Heffernan, K.; Lim, A.; Hurrell, J. G.; Preston, S.; Baterseh, A.; Vafaee, F.
Early detection of breast cancer remains essential for improving clinical outcomes, and complementary non-invasive approaches are needed to support existing screening methods, particularly for women with dense breast tissue. We have previously reported plasma lipid biomarker discovery using untargeted high-resolution liquid chromatography tandem mass spectrometry (LC-MS/MS). In this study, we performed biomarker confirmation and developed machine-learning models applied to targeted plasma lipid measurements for the non-invasive detection of early-stage breast cancer across international cohorts with independent external validation. Targeted LC-MS/MS was used to quantify candidate lipid panels in plasma samples from European discovery cohorts (n = 554) and an independent Australian cohort (n = 266) used for external validation. Data-driven feature selection identified a 15-lipid panel with strong performance in European cohorts (AUC >= 0.94). External validation prior to confidence stratification yielded 76% sensitivity, 64% specificity, and an AUC of 0.81 in the Australian validation cohort. Clinical assay development requires iterative panel and model testing to support translational feasibility and performance in the intended-use population. An analytically viable panel, excluding lipids requiring complex and costly synthesis, achieved comparable accuracy with improved assay robustness. Confidence-based analysis showed enhanced performance for predictions made with moderate to high confidence, with sensitivity up to 89% and AUC up to 0.85, suggesting that ongoing research should focus on strategies to enhance diagnostic model confidence. Importantly, model predictions were independent of breast density, tumour size, grade, subtype, and morphology, indicating biological specificity of the lipid signature. These results demonstrate that calibrated machine-learning models applied to plasma lipid biomarkers can support non-invasive breast cancer detection. 
Expanding training datasets to include greater diversity will further improve performance in the ongoing development of this lipid-based detection approach.
Solanki, S.; Solanki, N.; Prasad, J.; Prasad, R.; Harsulkar, A.
Background: Early breast cancer detection remains central to improving clinical outcomes, yet conventional screening pathways, particularly mammography, have recognized limitations in sensitivity, specificity, and performance in dense breast tissue. Circulating microRNAs (miRNAs) have emerged as promising minimally invasive biomarkers, while artificial intelligence and machine learning (AI/ML) offer powerful tools for identifying diagnostically relevant multi-marker patterns within complex biomarker datasets. This systematic review and meta-analysis evaluated the diagnostic performance of AI/ML-based circulating miRNA signatures for early breast cancer detection. Methods: A systematic search of PubMed/MEDLINE, Scopus, and Web of Science Core Collection was conducted from database inception to 31 December 2025. Studies were eligible if they were original human investigations evaluating circulating miRNAs using an AI/ML-based diagnostic model for breast cancer detection and reporting extractable diagnostic performance metrics. Study selection followed PRISMA 2020 and PRISMA-DTA guidance. Methodological quality was assessed using QUADAS-2. Pooled sensitivity and specificity were synthesized using a bivariate random-effects model, and overall diagnostic performance was summarized using a hierarchical summary receiver operating characteristic framework. Results: Seven studies met the inclusion criteria for qualitative synthesis, with eligible studies contributing to the quantitative analysis depending on data availability. Across the pooled analysis, AI/ML-based circulating miRNA models demonstrated good overall diagnostic performance, with a pooled AUC of 0.905 (95% CI: 0.890 to 0.921), pooled sensitivity of 81.3% (95% CI: 76.8% to 85.2%), and pooled specificity of 87.0% (95% CI: 82.4% to 90.7%). Heterogeneity was moderate for AUC (I² = 42.3%) and sensitivity (I² = 38.7%) and low for specificity (I² = 28.4%).
Risk-of-bias assessment showed overall low-to-moderate methodological concern, with patient selection representing the most variable domain. Deeks' funnel plot asymmetry test showed no significant evidence of publication bias (p = 0.34). Conclusions: AI/ML-based circulating miRNA signatures show promising diagnostic accuracy for early breast cancer detection and may have value as non-invasive adjunctive tools within imaging-supported diagnostic pathways. However, the evidence base remains limited by methodological heterogeneity, variable validation rigor, and the predominance of retrospective case-control designs. Prospective, standardized, and externally validated studies are needed before routine clinical implementation can be justified.
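For readers unfamiliar with the heterogeneity statistic quoted above, here is a minimal sketch of the Higgins I² calculation from Cochran's Q. The abstract does not report Q, so the value of Q used below is purely hypothetical; only the study count of seven comes from the text.

```python
def i_squared(cochran_q, n_studies):
    """Higgins I^2 (%) = max(0, (Q - df) / Q) * 100, with df = n_studies - 1."""
    df = n_studies - 1
    if cochran_q <= 0:
        return 0.0
    return max(0.0, (cochran_q - df) / cochran_q) * 100.0


# Hypothetical Q for a 7-study meta-analysis (df = 6); not taken from the review.
example_i2 = i_squared(12.0, 7)
print(f"I^2 = {example_i2:.1f}%")  # (12 - 6) / 12 = 50.0%
```

I² expresses the share of between-study variability not attributable to chance, so values near 40% are conventionally read as moderate.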
Murugadoss, K.; Venkatakrishnan, A. J.; Soundararajan, V.
Metabolic dysfunction is increasingly recognized as a risk factor for poor outcomes in breast cancer, but whether incretin-based therapies confer survival benefit beyond weight loss remains unresolved. Using a federated electronic health record platform spanning nearly 29 million patients, we evaluated breast cancer survival after semaglutide and tirzepatide initiation in routine care. In 1:1 propensity-matched pooled-comparator analyses, semaglutide was associated with improved overall survival versus metformin, sodium-glucose cotransporter 2 (SGLT2) inhibitor, and dipeptidyl peptidase 4 (DPP4) inhibitor users, with 54 deaths among 2,433 semaglutide users (2.2%) versus 395 deaths among 2,433 comparators (16.2%) over 24 months (log-rank P < 0.001). Tirzepatide showed a favorable survival association relative to pooled anti-diabetic comparators that did not meet statistical significance (P = 0.24), with 3 deaths among 220 users (1.4%) versus 64 deaths among 220 comparators (29.1%). In a head-to-head propensity-score-matched comparison, overall survival did not differ significantly between semaglutide-treated and tirzepatide-treated patients with pre-existing breast cancer (2,117 per arm; P = 0.12). In semaglutide-treated patients alive and observable at the 1-year landmark, higher maximum dose achieved was significantly associated with lower post-landmark mortality (P = 0.034), with an event rate of approximately 1.0% in the high-dose group (>=1.7 mg) versus approximately 4.5% in the low-dose group (0.25-1.0 mg). Despite a linear dose-weight loss relationship for semaglutide, weight loss strata did not separate survival outcomes (global P = 0.22). In tirzepatide-treated patients alive and observable at the same landmark, neither maximum dose achieved nor weight loss strata separated post-landmark survival (P = 0.98 and P = 0.50, respectively).
Structured EHR and AI-based clinical note analyses further showed significantly lower frequency of documented metastatic disease in semaglutide-treated patients relative to pooled anti-diabetic comparators, including any metastasis (7.0% versus 15.0%, rate ratio 0.5, P < 0.001), bone metastasis (1.0% versus 5.2%, rate ratio 0.2, P < 0.001), and liver, lung, or brain metastases (all P < 0.001). LLM-derived cause-of-death extraction further showed a 60% lower relative proportion of cancer-associated deaths in semaglutide-treated patients (19% of ascertainable deaths) than in matched pooled anti-diabetic comparators (47% of ascertainable deaths), with comparator deaths more often attributed to cancer progression involving metastatic breast cancer, leptomeningeal carcinomatosis, and cancer-driven organ failure. Overall, this study demonstrates that semaglutide use in patients with pre-existing breast cancer is associated with a dose-correlated but weight-loss-independent improvement in overall survival. These findings motivate prospective trials of GLP-1 receptor agonists in breast cancer across various stages and treatment settings.
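The crude proportions, rate ratios, and relative reduction quoted in this abstract can be re-derived with simple arithmetic. The sketch below only recomputes the figures reported in the text; the helper names are illustrative and it does not reproduce the propensity matching or survival modelling.

```python
def percent(events, total):
    """Crude proportion expressed as a percentage."""
    return 100.0 * events / total


def rate_ratio(treated_rate, comparator_rate):
    """Ratio of two rates or proportions expressed on the same scale."""
    return treated_rate / comparator_rate


def relative_reduction(treated_rate, comparator_rate):
    """Relative reduction of the treated rate versus the comparator rate."""
    return 1.0 - treated_rate / comparator_rate


# Figures as reported in the abstract.
sema_mortality = percent(54, 2433)           # semaglutide deaths
comp_mortality = percent(395, 2433)          # pooled comparator deaths
any_met_rr = rate_ratio(7.0, 15.0)           # any metastasis
bone_met_rr = rate_ratio(1.0, 5.2)           # bone metastasis
cancer_death_drop = relative_reduction(19.0, 47.0)

print(f"mortality: {sema_mortality:.1f}% vs {comp_mortality:.1f}%")
print(f"rate ratios: {any_met_rr:.2f} (any), {bone_met_rr:.2f} (bone)")
print(f"relative reduction in cancer-associated deaths: {cancer_death_drop:.0%}")
```

These are unadjusted checks of internal consistency, and they say nothing about confounding or immortal-time bias in the underlying comparison.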
Anctil, N.; Hauguel, P.; Noel, L.-P.
Background. Breast cancer (BC) remains the most diagnosed malignancy and leading cancer-related cause of mortality in women worldwide. Although blood-based untargeted metabolomics has emerged as a promising modality for detecting early-stage BC, the clinical translation of this approach has been bottlenecked by two unresolved issues: (i) the field has almost exclusively relied on serum or plasma, which require venipuncture and cold-chain logistics, and (ii) machine-learning models reported on such data are frequently validated with protocols that are blind to analytical batch structure, producing optimistically biased performance estimates. Methods. We present a breast cancer detection study based on dried blood spots (DBS), an analytical matrix that enables self-collection and ambient-temperature shipping. A cohort of 2,734 participants (114 biopsy-confirmed BC cases; 2,620 non-cancer controls) was profiled by untargeted LC-MS/MS on a Thermo Scientific Orbitrap IQ-X coupled to a Vanquish UHPLC. A 39-metabolite panel meeting MSI Level 1 identification criteria was pre-specified a priori from the published breast-cancer metabolomics literature, frozen prior to LC-MS acquisition, and applied to the present cohort without any feature selection on the data. Six standard supervised-learning architectures (LASSO, Elastic Net, Linear SVM, PLS-DA, OPLS-DA, XGBoost) were evaluated on this pre-specified panel; OPLS-DA is reported only in the sex-matched subgroup analysis where a single-seed 5-fold stratified protocol permits a directly comparable fit. Per-batch control-median normalization is applied upstream; kNN imputation, log transform, and robust scaling are fit within each training fold. 
The evaluation battery comprises batch-aware StratifiedGroupKFold CV at single-seed (seed=42) with inter-seed SD quantified across 10 independent seeds, batch-aware nested CV, a 100-seed held-out 20%-batch validation with disjoint-batch isotonic probability calibration (30% calibration partition), PPV/NPV reporting at multiple operating points and three deployment prevalences, subgroup analyses by TNM stage and tumor grade, pathway-ablation sensitivity analysis, and a 1,000-iteration permutation test. Results. Under batch-aware evaluation (StratifiedGroupKFold, single-seed=42), AUC ranged from 0.914 to 0.949 across classifiers, with LASSO achieving 0.928 and XGBoost 0.949; inter-seed SD across 10 seeds was 0.002-0.006. At 95% specificity, LASSO reached 75.4% sensitivity and XGBoost 81.6%. Held-out batch validation (100 seeds) yielded mean AUC 0.912 for Elastic Net and 0.935 for XGBoost, confirming robust generalization. All 39 panel features showed high coefficient stability, and permutation testing on representative classifiers (LASSO, Linear SVM, PLS-DA) yielded p <= 0.001. Subgroup analyses showed weaker detection of stage IIA tumors (AUC 0.87, n=40) compared with stage IIB/IIIA (AUC 0.95), consistent with stronger metabolic signatures in more advanced disease. Bootstrap coefficient consistency of the Elastic Net classifier confirmed that all 39 panel features received a non-zero multivariate weight in >=80% of 100 stratified bootstraps. Conclusions. On this cohort of diagnosed, pre-treatment breast-cancer cases, DBS LC-MS metabolomic profiling delivers classification performance (AUC 0.928 for LASSO and 0.949 for XGBoost under batch-aware GroupKFold CV at single-seed=42; held-out AUC 0.912-0.935) that is robust across classifier families and biological pathways. The DBS matrix is non-radiating, self-collectable by finger-prick, and mailable at ambient temperature. 
Performance is weaker on stage IIA than on more advanced disease, and prospective validation in an independent asymptomatic screening cohort is required before clinical positioning as a decentralized triage modality.
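To show why PPV/NPV are reported at several deployment prevalences, the sketch below applies Bayes' rule to the LASSO operating point quoted above (75.4% sensitivity at 95% specificity). The three prevalence values are assumptions chosen for illustration, since the abstract does not list the ones the authors used.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Positive and negative predictive value at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


SENSITIVITY, SPECIFICITY = 0.754, 0.95   # LASSO operating point from the abstract
ASSUMED_PREVALENCES = (0.005, 0.01, 0.05)  # hypothetical deployment settings

for prevalence in ASSUMED_PREVALENCES:
    ppv, npv = ppv_npv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

Even at a fixed operating point, PPV falls sharply as prevalence drops, which is why case-control performance does not translate directly to an asymptomatic screening population.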
Han, S.; Xiang, D.; Chen, X.; Zhao, D.; Qin, G.; Bronson, R.; Li, Z.
Recurrent loss-of-function mutations in RUNX1 occur in estrogen receptor-positive (ER+) breast cancers, yet how RUNX1-loss contributes to breast tumorigenesis remains unclear. Here we used genetically engineered mouse models with luminal mammary epithelial cell (MEC)-restricted gene disruption to investigate its role in breast cancer initiation. Loss of RUNX1 alone, or together with RB1, was insufficient to drive tumor formation. In contrast, combined loss of RUNX1 and p53 induced mammary tumors with full penetrance. These tumors contained ER+ cancer cells and exhibited extensive T cell and macrophage infiltration, indicative of an immune hot microenvironment. Mechanistically, RUNX1-deficiency activated interferon signaling in luminal MECs, associated with derepression of RUNX1 target STAT1 and enhanced inflammatory responses. Consistent with these findings, human ER+ breast cancers with low RUNX1 expression displayed elevated immune signatures and poorer patient survival. Together, our results identify RUNX1-loss as a driver of an immune-active subtype of ER+ breast cancer.
Trummer, N.; Weyrich, M.; Ryan, P.; Furth, P. A.; Hoffmann, M.; List, M.
Anti-hormonal therapies such as selective estrogen receptor modulators like tamoxifen or aromatase inhibitors like letrozole represent a cornerstone for breast cancer prevention and therapy of estrogen receptor-positive breast cancer. Therapeutic monitoring can include blood tests and imaging; however, genetically-based approaches are not yet in practice. Ideally, a test would be able to detect a positive molecular response across different estrogen pathway-suppressive approaches. Circular RNAs are a species of non-coding RNAs detectable in plasma that have been proposed as non-invasive therapeutic biomarkers. To determine whether a set of specific circular RNAs is altered across estrogen-suppressive pathway approaches, we analyzed mammary gland-specific total RNA sequencing data from two individual genetically engineered mouse models (GEMMs) of estrogen pathway-induced breast cancer, with or without exposure to tamoxifen or letrozole. The nf-core/circrna pipeline was used to identify circRNAs that were differentially expressed in response to either tamoxifen or letrozole. We then screened for circRNAs that were differentially regulated by both anti-hormonals. Four up-regulated and 31 down-regulated circRNAs with host genes known to be expressed in human breast epithelial cells were identified as showing reproducible differential regulation in response to anti-hormonal treatment.
Cody, M. E.; Chang, H.-C.; Foldi, J.; Jankowitz, R. C.; Balic, M.; Cushing, T.; Donnelly, C.; Freeney, S.; Levine, J.; Petitti, L.; Ryan, N.; Spencer, K.; Turner, C.; Tseng, G. C.; Desmedt, C.; Oesterreich, S.; Lee, A. V.
Background: Invasive lobular breast cancer (ILC) is the most commonly diagnosed special histological subtype of breast cancer (BC). Metastatic ILC (mILC) is less sensitive to FDG-PET imaging and often metastasizes to unusual sites, including the peritoneum, gastrointestinal (GI) tract, ovaries, urinary tract, and orbit, which may go unrecognized after a long disease-free interval. Some metastatic sites cause nonspecific symptoms, like abdominal/epigastric pain, with numerous published case reports of mILC misdiagnosed as gastric cancer. These atypical BC metastatic sites may lead to late and/or misdiagnosis, thereby delaying effective treatments. Objective: We developed a patient survey to investigate the patient-reported prevalence of delayed diagnosis or misdiagnosis of mILC and their potential impact upon treatment outcomes. Methods: A 45-question survey was developed and piloted with breast cancer researchers, clinical oncologists, and patient advocates. This IRB-approved survey was then distributed to patients with ILC. Analyses including data QC and visualization were conducted in R using descriptive statistics. Incomplete or inconsistent responses were excluded, and summary statistics were stratified by four common mILC sites to highlight subgroup differences. Results: 525 patient surveys were completed, with 450 patients diagnosed with ILC, and of those 321 diagnosed with mILC. For those with mILC, 33.3% (n=107) were diagnosed with de novo mILC at initial presentation. Of the patients diagnosed with mILC, 32.1% (n=103) presented with other medical conditions at diagnosis. Misdiagnosis was reported by 26.2% (n=84) of patients with mILC, and of these cases, 31% (n=26) had ≥2 misdiagnoses. The top 5 misdiagnoses were bone-related condition (24.7%), benign breast condition (23.4%), another type of BC (7.8%), diagnostic delay (7.8%), and menopause related (5.2%). 44.5% of patients waited ≥1 year for an accurate diagnosis.
49 patients were treated for their misdiagnosis, and 6 received incorrect cancer treatments. The most frequently reported contributors to delayed diagnosis or misdiagnosis were inconclusive imaging, providers' lack of ILC knowledge, and initial misdiagnosis. Of the 321 patients with mILC, 138 (42.9%) reported symptoms before diagnosis; the most common were back pain (16.5%), fatigue/malaise (14.9%), GI symptoms (11.8%), bloating (8.4%), and weight loss (8.1%). Although 40% of patients reported having a mammogram at the time of their initial misdiagnosis, ILC was detected in only 20.5% (24/116) of these cases, and mammography detected only 5 (25%) of the 20 de novo mILC cases. Patients reported additional diagnostic testing within 1-3 months of their initial mammogram, including biopsy, ultrasound (US), and MRI. 47.9% of patients were in active BC surveillance after curative intent therapy at the time of their mILC diagnosis; however, no statistical difference was seen in time to diagnosis versus those patients not under surveillance. Conclusion: Our survey results underscore the urgent need to improve diagnostic strategies for mILC. Addressing delays and diagnostic errors in mILC is critical to optimizing treatment strategies and improving patient outcomes.
Aggarwal, D.; Russo, S.; Anderson, K.; Floyd, T.; Utama, R.; Rouse, J. A.; Naik, P.; Pawlak, S.; Iyer, S. V.; Kramer, M.; Satpathy, S.; Wilkinson, J. E.; Gao, Q.; Bhatia, S.; Arun, G.; Akerman, M.; McCombie, W. R.; Revenko, A.; Kostroff, K.; Spector, D. L.
Background: Long non-coding RNAs (lncRNAs) have emerged as key regulators of tumor biology; however, none have yet translated to cancer therapies. The lncRNA MALAT1 is overexpressed in more than 20 cancers, including breast cancer, and has been shown to function via various mechanisms in a context-dependent manner in 2D cell lines and mouse models. However, its functional role and therapeutic potential have not been evaluated in clinically relevant patient-derived models. Methods: We investigated the therapeutic potential of a MALAT1-targeting antisense oligonucleotide (ASO) for breast cancer, using clinically relevant 3D human patient-derived organoids (PDOs) and PDO-xenograft (PDO-X) models. We systematically evaluated the efficiency of MALAT1-targeting ASOs using a biobank of 28 PDO models. Using three independent PDO-X models of triple negative breast cancer (TNBC), we targeted MALAT1 in vivo to study its impact on transcription, alternative splicing, stromal remodeling and metastasis. Results: Across PDO-X models, MALAT1 depletion reproducibly drove widespread alternative splicing changes across all event types, particularly intron retention events, accompanied by modest gene expression alterations. Differentially spliced transcripts were enriched for targets of shared cancer-associated transcription factors, and MALAT1 knockdown altered the relative abundance of previously unannotated splicing isoforms. Beyond tumor-intrinsic effects, tumor-specific MALAT1 depletion induced a consistent reduction in macrophage-associated gene signatures and reduced lung metastatic burden. Conclusions: Our data define MALAT1's multifaceted role in TNBC, coordinating alternative splicing, transcriptional fine-tuning, tumor-stroma crosstalk, and metastatic progression. Our study provides strong preclinical evidence supporting MALAT1-targeted ASO therapy and establishes PDO-X models as a clinically relevant platform for functional interrogation of TNBC therapies.
O'Mahony, D. G.; Beasley, J.; Zanti, M.; Dennis, J.; Dutta, D.; Kraft, P.; Kristensen, V.; Chenevix-Trench, G.; Easton, D. F.; Michailidou, K.
Summary statistics fine-mapping methods offer advantages over classical methods, including avoiding data-sharing constraints and improved modelling of correlated variables and sparse effects. However, their performance has not been comprehensively evaluated in breast cancer using real-world data. Previous multinomial stepwise regression (MNR) fine-mapping analyses for breast cancer identified 196 credible sets. Here, we apply summary statistics fine-mapping, compare methods, and assess parameters influencing performance. Using summary statistics from the Breast Cancer Association Consortium, we compared finiMOM, SuSiE, and FINEMAP to published MNR results across 129 regions. Performance was assessed by recall using in-sample and out-of-sample LD. Discordant credible sets were examined for technical factors, and target genes were defined using the INQUISIT pipeline. SuSiE showed the closest agreement with MNR. Results varied across regions depending on the assumed number of causal variants (L), with higher values reducing recall and no single L maximising performance. At optimal L per region, SuSiE identified 8,192 CCVs in 244 credible sets, with recall of 88%, 86%, and 72% for overall, ER-positive, and ER-negative breast cancer. Thirty MNR sets were missed. Discordance was partially explained by allele flips, imputation quality, and array heterogeneity. Fifty-two MNR-identified genes, including BRCA2, WNT7B and CREBBP, were not recovered, while additional candidate genes were identified. Using out-of-sample LD reduced recall by 3% but identified novel variants. Fine-mapping results vary across methods, and no single approach is sufficient. The choice of L strongly influences results, and combining analytical approaches with functional validation can improve causal variant identification.
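As a rough illustration of the two quantities discussed above, the sketch below builds a credible set from per-variant posterior inclusion probabilities (PIPs) and scores recall against a reference set. It is a generic construction, not the SuSiE, FINEMAP, or finiMOM API; the variant identifiers and PIPs are made up, and the abstract does not state exactly how recall was defined, so the variant-level definition here is an assumption.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of variants, taken in descending PIP order, whose
    cumulative PIP reaches the requested coverage for one causal signal."""
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen


def recall(found, reference):
    """Share of reference variants that the new analysis also reports."""
    reference = set(reference)
    return len(reference & set(found)) / len(reference)


# Hypothetical single-signal region: PIPs sum to 1 across five variants.
toy_pips = {"rs1": 0.60, "rs2": 0.25, "rs3": 0.08, "rs4": 0.04, "rs5": 0.03}
toy_set = credible_set(toy_pips)
# Hypothetical reference set standing in for a previously published analysis.
toy_recall = recall(toy_set, ["rs1", "rs2", "rs6"])
print(toy_set, round(toy_recall, 2))
```

When several causal variants are assumed (L > 1), methods such as SuSiE report one such set per signal, which is one reason the choice of L changes what is recovered.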
Ibanez-Rios, M.-I.; Aalam, S. M. M.; Ritting, M. L.; Jore, A.; Chaludiya, K.; Emperumal, C. P.; Jakub, J. W.; McLaughlin, S. A.; Degnim, A. C.; Couch, F.; Boughey, J. C.; Yadav, S.; Sadanandam, A.; Sherman, M. E.; Radisky, D.; Knapp, D. J. H. F.; Kannan, N.
The normal adult male breast has not been characterized at single-cell resolution, leaving the cellular basis of male breast cancer (MBC) biology undefined. Here we present an integrated single-cell RNA sequencing atlas of the adult human breast comprising 174,471 cells from 17 donors (3 male, 14 female), including 18,117 male-derived cells. This revealed that the male breast retains all three epithelial populations, basal (BC), luminal progenitor (LP), and luminal committed cells (LC), but with an increase in LC at the expense of BC and LP across all three male donors. Male LC were distinguished from female by elevated ESR1 and PGR mRNA, enrichment of RNA processing and ribosome biogenesis programs, reduced inflammatory cytokine and growth factor signaling, elevated estradiol gene set enrichment scores, and higher inferred activity of developmental patterning transcription factors. This pattern was observed across differential expression, gene ontology, ligand profiling, and regulon-based analyses, and was not restricted to sex chromosome-linked gene expression. This is consistent with the near-universal estrogen receptor (ER) positivity that characterizes MBC clinically. This atlas provides the first cellular and transcriptional reference for the normal male breast and a resource for investigating sex differences in mammary biology, germline susceptibility variant interpretation, and modeling breast malignancies.
Muroyama, Y.; Yanagaki, M.; Tada, H.; Ebata, A.; Ito, T.; Ono, K.; Tominaga, J.; Miyashita, M.; Suzuki, T.
Secretory breast carcinoma (SBC) is typically indolent, yet mechanisms underlying aggressiveness and therapeutic resistance to tropomyosin receptor kinase inhibitors (TRKi) remain unclear. Autopsy-based longitudinal multi-organ high-dimensional profiling of metastatic TRKi-resistant SBC demonstrated histopathological heterogeneity, including secretory and squamous components, arising from a shared clonal origin. Integrated genomic and transcriptomic analyses revealed hierarchical transcriptional rewiring consistent with a lineage-plastic state, suggesting a potential link to tumor aggressiveness and therapeutic resistance.
Yao, S.; Zimbalist, A.; Sheng, H.; Fiorica, P.; Cheng, R.; Medicino, L.; Omilian, A.; Zhu, Q.; Roh, J.; Laurent, C.; Lee, V.; Ergas, I.; Iribarren, C.; Rana, J.; Nguyen-Huynh, M.; Rillamas-Sun, E.; Hershman, D.; Ambrosone, C.; Kushi, L.; Greenlee, H.; Kwan, M.
Background: Few studies have examined racioethnic disparities in cardiovascular disease (CVD) in women after breast cancer treatment, who are at higher risk due to cardiotoxic cancer treatment. Methods: Based on the Pathways Heart Study of women with a history of breast cancer, this analysis examines the association between cardiometabolic risk factors (hypertension, diabetes, and dyslipidemia) and CVD events with self-reported race and ethnicity, as well as genetic similarity. Multivariable logistic and Cox proportional hazards regression models were used to test race and ethnicity and genetic similarity with prevalent and incident cardiometabolic risk factors and CVD events. Results: Of the 4,071 patients in this analysis, non-Hispanic Black (NHB), Asian, and Hispanic women were more likely to have prevalent and incident diabetes than non-Hispanic White (NHW) women. Analysis of genetic similarity revealed results consistent with self-reported race and ethnicity. For CVD risk, NHB women were more likely to develop heart failure and cardiomyopathy than NHW women. In contrast, Hispanic women were at lower risk of any incident CVD, serious CVD, arrhythmia, heart failure or cardiomyopathy, and ischemic heart disease, which was consistent with the associations found with Native American ancestry. Conclusions: This is the largest multi-ethnic study of disparities in CVD health in breast cancer survivors, demonstrating corroborating findings between self-reported race and ethnicity and genetic similarity. The results highlight disparities in cardiometabolic risk factors and CVD among breast cancer survivors that warrant more research and clinical attention in these distinct, high-risk populations.
Ingawale, V.; Dandapat, K.; Konkada Manattayil, J.; Gupta, S.; Shashidhara, L. S.; Koppiker, C.; Shah, N.; Raghunathan, V.; Kulkarni, M.
Collagen organisation within the tumour microenvironment plays a critical role in tumour progression and has emerged as an important structural biomarker in cancer. Second Harmonic Generation (SHG) microscopy enables label-free visualisation and quantitative assessment of fibrillar collagen architecture; however, its high cost, specialised instrumentation, and limited field-of-view restrict routine clinical application. In this study, we evaluated whether collagen features quantified from digitally scanned Masson-Goldner trichrome-stained histopathological sections can approximate measurements obtained from SHG microscopy. Formalin-fixed paraffin-embedded breast tumour tissues, including benign and invasive ductal carcinoma (IDC) samples with varying collagen content, were analysed using SHG microscopy and whole-slide brightfield imaging. Matched regions of interest were analysed using two independent digital image analysis approaches: a conventional ImageJ-based workflow (TWOMBLI) and a machine learning-based computational pipeline. Collagen structural parameters including collagen deposition area, fibre number, and alignment metrics were quantified and compared across imaging modalities using correlation analysis. SHG signals were consistently detected from trichrome-stained sections, confirming compatibility of SHG imaging. Quantitative comparison demonstrated significant concordance between SHG-derived collagen metrics and those obtained from digital image analysis pipelines, particularly for collagen area and fibre alignment. These findings demonstrate that computational analysis of routine histopathological images can capture key spatial features of collagen organisation comparable to SHG microscopy. Digital pathology-based collagen quantification therefore represents a scalable and clinically accessible approach for assessing extracellular matrix architecture in tumour tissues.
Heine, J.; Fowler, E.; Egan, K.; Weinfurtner, R. J.; Balagurunathan, Y.; Schabath, M. B.
A substantial body of evidence demonstrates that measures from mammograms are predictive of breast cancer risk. In this matched case-control study, mammograms acquired near the time of diagnosis were analyzed to investigate bilateral breast asymmetry as a measure of short-term risk prediction. Specifically, contralateral breast images were compared with measures derived in the Fourier domain (FD); this technique summarizes power in concentric radial bands that cover the Fourier plane. Equivalently, this approach can be described as a multiscale characterization of the image. The summarized power difference between respective contralateral bands produces an asymmetry measure. Full field digital mammography (FFDM) and synthetic two-dimensional images from digital breast tomosynthesis (DBT) were investigated for women who had both types of mammograms acquired at the same time. Odds ratios (ORs) and the areas under the receiver operating characteristic curves (Azs) were generated from conditional logistic regression modeling with 95% confidence intervals. Raw unprocessed FFDM images produced significant findings: OR = 1.90 (1.58, 2.29) and Az = 0.72 (0.67, 0.76) per one standard deviation unit. Associations were significant but attenuated for both clinical FFDM and DBT images: OR = 1.31 (1.11, 1.54) and Az = 0.63 (0.58, 0.67); and OR = 1.48 (1.25, 1.76) and Az = 0.65 (0.60, 0.70), respectively. Results suggest that clinical FFDM and DBT images are inferior to raw FFDM images in capturing breast asymmetry, with information loss for breast cancer risk prediction. Moreover, these DBT images have lower spatial resolution but produced stronger associations than the clinical FFDM images.
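The radial-band idea described above can be sketched on a toy scale as follows. This is an illustrative reading of the abstract, not the authors' implementation: the naive DFT is only practical for tiny arrays, the number of bands is arbitrary, and taking the per-band absolute power difference as the asymmetry summary is an assumption.

```python
import cmath
import math


def dft2_power(image):
    """Power spectrum from a naive 2D DFT (suitable for tiny toy images only)."""
    n_rows, n_cols = len(image), len(image[0])
    power = [[0.0] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            acc = 0j
            for r in range(n_rows):
                for c in range(n_cols):
                    angle = -2.0 * math.pi * (u * r / n_rows + v * c / n_cols)
                    acc += image[r][c] * cmath.exp(1j * angle)
            power[u][v] = abs(acc) ** 2
    return power


def radial_band_power(image, n_bands=4):
    """Sum spectral power inside concentric rings of the Fourier plane."""
    power = dft2_power(image)
    n_rows, n_cols = len(image), len(image[0])
    max_radius = math.hypot(n_rows / 2, n_cols / 2)
    bands = [0.0] * n_bands
    for u in range(n_rows):
        for v in range(n_cols):
            # Map DFT indices to signed frequencies centred on zero.
            fu = u if u <= n_rows // 2 else u - n_rows
            fv = v if v <= n_cols // 2 else v - n_cols
            ring = min(int(math.hypot(fu, fv) / max_radius * n_bands), n_bands - 1)
            bands[ring] += power[u][v]
    return bands


def band_asymmetry(left_image, right_image, n_bands=4):
    """Per-band absolute power difference between contralateral images."""
    left = radial_band_power(left_image, n_bands)
    right = radial_band_power(right_image, n_bands)
    return [abs(a - b) for a, b in zip(left, right)]


# Hypothetical 4x4 "images": one textured, one flat with the same mean.
toy_left = [[1, 2, 1, 0], [0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1]]
toy_right = [[1, 1, 1, 1] for _ in range(4)]
print(band_asymmetry(toy_left, toy_right))
```

Low-index rings capture coarse structure and high-index rings capture fine texture, which is the sense in which the abstract calls the approach a multiscale characterization.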
Chandra, S.
Background: Current deep learning models in computational pathology, radiology, and digital pathology produce opaque predictions that lack the explainable artificial intelligence (xAI) capabilities required for clinical adoption. Despite achieving radiologist-level performance in tasks from whole-slide image (WSI) classification to mammographic screening, these models function as black boxes: clinicians cannot trace predictions to specific biological features, verify outputs against established morphological criteria, or integrate AI reasoning into precision oncology workflows and tumor board decision-making. Methods: We present Virtual Spectral Decomposition (VSD), a modality-agnostic, interpretable-by-design framework that decomposes medical images into six biologically interpretable tissue composition channels using sigmoid threshold functions - the same mathematical structure as CT windowing. Unlike post-hoc xAI methods (Grad-CAM, SHAP, LIME) applied to black-box deep learning models, VSD channels have pre-defined biological meanings derived from tissue physics, providing inherent explainability without sacrificing quantitative rigor. For whole-slide image (WSI) analysis in digital pathology, we introduce the dendritic tile selection algorithm, a biologically-inspired hierarchical architecture achieving 70-80% computational reduction while preferentially sampling the tumor immune microenvironment. VSD is validated across three cancer types and imaging modalities: pancreatic ductal adenocarcinoma (PDAC) on CT imaging, lung adenocarcinoma (LUAD) on H&E-stained pathology slides using TCGA data, and breast cancer on screening mammography. Composition entropy of the six-channel vector is computed as a visual Biological Entropy Index (vBEI) - an imaging biomarker quantifying the diversity of active biological defense systems. 
Results: In pancreatic cancer, the fat-to-stroma ratio (a novel CT-derived radiomics biomarker) declines from >5.0 (normal) to <0.5 (advanced PDAC), enabling early detection of desmoplastic invasion before mass formation on standard imaging. In lung cancer, composition entropy from H&E whole-slide images correlates with tumor immune microenvironment markers from RNA-seq (CD3: rho=+0.57, p=0.009; CD8: rho=+0.54, p=0.015; PD-1: rho=+0.54, p=0.013) and predicts overall survival (low entropy immune-desert phenotype: 71% mortality vs 29%, p=0.032; n=20 TCGA-LUAD), providing immune phenotyping for checkpoint immunotherapy patient selection from a $5 H&E slide without molecular assays. In breast cancer, each lesion type produces a characteristic six-channel fingerprint functioning as an interpretable computer-aided diagnosis (CAD) system for quantitative BI-RADS assessment and subtype classification (IDC vs ILC vs DCIS vs IBC). A five-level xAI audit trail provides complete traceability from clinical decision support output to specific biological structures visible on the original images. Conclusion: VSD establishes a unified, interpretable-by-design mathematical framework for explainable tissue composition analysis across imaging modalities and cancer types. Unlike black-box deep learning and post-hoc xAI approaches, VSD provides inherently interpretable, clinically verifiable cancer detection and immune phenotyping from standard clinical imaging at existing costs - without requiring foundation model infrastructure, specialized hardware, or molecular assays. The open-source pipeline (Google Colab, Supplementary Material) enables immediate reproducibility and extension to additional cancer types across the pan-cancer TCGA atlas.
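The two building blocks named in this abstract, sigmoid threshold channels and an entropy over the six-channel composition vector, can be sketched as follows. This is an illustration under stated assumptions, not the VSD implementation: the band channel built from two sigmoids, the use of Shannon entropy in bits, and all function names and window values are hypothetical, since the abstract does not give the exact formulas.

```python
import math

def sigmoid_window(value, center, width):
    """Soft threshold: near 0 well below center, near 1 well above it.

    Written with tanh so that extreme inputs cannot overflow.
    """
    return 0.5 * (1.0 + math.tanh((value - center) / (2.0 * width)))

def band_channel(value, low, high, width):
    """Assumed channel form: soft membership in [low, high] from two sigmoids."""
    rising = sigmoid_window(value, low, width)
    falling = 1.0 - sigmoid_window(value, high, width)
    return rising * falling

def composition_entropy(weights):
    """Shannon entropy (bits) of a non-negative channel vector after normalising."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

# Illustrative window only (placeholder values, not taken from the source).
example_membership = band_channel(-100.0, low=-150.0, high=-50.0, width=5.0)

# A balanced six-channel vector reaches the maximum entropy, log2(6);
# a vector dominated by a single channel has entropy 0.
balanced = composition_entropy([1.0] * 6)
single = composition_entropy([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

Under these assumptions, low entropy means one tissue channel dominates and high entropy means the composition is spread across channels.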
Rani, A.; Mishra, S.
Accurate histopathological differentiation between High-Grade Serous Carcinoma (HGSC) and Low-Grade Serous Carcinoma (LGSC) remains a critical yet challenging aspect of ovarian cancer diagnosis due to their similar morphology and different clinical outcomes. This study presents a deep learning framework that uses custom attention mechanisms, including the Convolutional Block Attention Module (CBAM), Squeeze-and-Excitation (SE) blocks, and a Differential Attention module within five CNN architectures for automated binary classification of ovarian cancer subtypes from H&E WSI patches. Although individual models achieved higher accuracy, the ensemble stacking framework with a shallow MLP meta-learner delivered the best overall performance, with a ROC-AUC of 0.9211, an accuracy of 0.85, and F1-scores of 0.84 and 0.85 across both subtypes. These findings demonstrate that attention-guided feature recalibration combined with ensemble stacking provides robust and clinically interpretable discrimination of ovarian carcinoma subtypes.
Zvereva, A.; Kemp, H.; Gillespie, A.; Tomczyk, K.; Romualdo Cardoso, S.; Sevgi, S.; Mackie, K.; Fedele, V.; Alexander, J.; Goulding, I.; Gomm, J.; Jones, J. L.; Baxter, J. S.; Pettitt, S. J.; Lord, C. J.; Fletcher, O.; Haider, S.; Johnson, N.
Genome-wide association studies have led to the identification of more than 150 genomic regions that are associated with breast cancer risk. Translating these findings into a greater understanding of that risk requires identification of functional variants and target genes. Breast cancer progression and metastasis do not depend solely on cancer cell autonomous defects; the stroma, of which fibroblasts comprise a dominant component, also has a functional role. We generated promoter capture Hi-C data in primary and immortalized mammary fibroblasts and identified 28 interaction peaks involving 116 credible causal breast cancer variants and 26 target genes that were exclusive to fibroblasts. Integrating these data with H3K27ac CUT&Tag peaks identified a potentially functional variant (rs17393059) and target gene (filamin A interacting protein 1 like (FILIP1L)) at the 3q12.1 breast cancer risk locus. Using genome-wide functional data in breast-relevant cell types, we demonstrate that perturbation of gene expression in mammary fibroblasts may impact risk of breast cancer by a cell non-autonomous mechanism.
Diaz, F. C.; Waldrup, B.; Carranza, F. G.; Manjarrez, S.; Velazquez-Villarreal, E.
Background: Sezary syndrome (SS) is an aggressive leukemic variant of cutaneous T-cell lymphoma (CTCL) with distinct clinical and biological features compared to rarer entities such as primary cutaneous CD8+ aggressive epidermotropic cytotoxic T-cell lymphoma (PCAECTCL). Although recurrent genomic alterations in CTCL have been described, comparative analyses at the pathway level across biologically divergent subtypes remain limited. Here, we leveraged a conversational artificial intelligence (AI) platform for precision oncology to enable rapid, integrative, and hypothesis-driven interrogation of publicly available genomic datasets. Methods: We conducted a secondary analysis of somatic mutation and clinical data from the Columbia University CTCL cohort accessed via cBioPortal. Cases were stratified into SS (n=26) and PCAECTCL (n=13). High-confidence coding variants were curated and mapped to biologically relevant signaling pathways and functional gene categories implicated in CTCL pathogenesis. Pathway-level mutation frequencies were compared using Chi-square or Fisher's exact tests, with effect sizes quantified as odds ratios. Tumor mutational burden (TMB) was compared using the Wilcoxon rank-sum test. Subtype-specific co-mutation patterns were evaluated using pairwise association analyses and visualized through oncoplots and network heatmaps. Conversational AI agents, AI-HOPE, were used to iteratively refine cohort definitions, prioritize pathway-level signals, and contextualize findings. Results: TMB was comparable between SS and PCAECTCL (p = 0.96), indicating no significant difference in global mutational load. In contrast, pathway-centric analyses revealed marked qualitative differences. SS demonstrated enrichment of alterations in epigenetic regulators, tumor suppressor and cell-cycle control pathways, NFAT signaling, and DNA damage response mechanisms, consistent with transcriptional dysregulation and immune modulation. 
PCAECTCL exhibited relatively higher frequencies of alterations involving epigenetic regulators and MAPK pathway signaling, suggesting distinct oncogenic dependencies. Co-mutation analysis revealed a more constrained and focused interaction landscape in SS, whereas PCAECTCL displayed broader and more heterogeneous co-mutation networks, indicative of divergent evolutionary trajectories. Notably, the frequency of ERBB2 mutations differed significantly between subtypes (p = 0.031), highlighting a potential subtype-specific therapeutic vulnerability. Conclusions: This study demonstrates that SS is distinguished from PCAECTCL not by increased mutational burden but by distinct pathway-level architectures, particularly involving epigenetic regulation, immune signaling, and transcriptional control. These findings generate biologically grounded, testable hypotheses for subtype-specific therapeutic targeting and underscore the value of conversational AI as a scalable framework for accelerating discovery in translational cancer genomics.
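The pathway-level comparison described in the Methods (Fisher's exact test with an odds ratio as effect size) can be sketched with the standard library alone. This is a generic illustration, not the study's code: the function name is hypothetical, and the counts in the usage lines are invented placeholders that only reuse the reported group sizes (SS n=26, PCAECTCL n=13).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, two-sided p-value). The p-value sums the
    probabilities of all tables with the same margins that are no more
    likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )
    # Assumption: report infinity when the odds-ratio denominator is zero.
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)

# Hypothetical counts: rows are subtypes, columns are (mutated, not mutated).
ss_mutated, ss_total = 10, 26        # placeholder count, reported group size
pc_mutated, pc_total = 1, 13         # placeholder count, reported group size
odds, p = fisher_exact_two_sided(
    ss_mutated, ss_total - ss_mutated, pc_mutated, pc_total - pc_mutated
)
```

With groups this small, an exact test is usually preferred over the Chi-square approximation, which is consistent with the Methods offering both.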
Pacht, E.; Warren, J.; Toor, R.; Glass, K. C.; Greenyer, H.; Fritz, A.; Banerjee, B.; Frietze, S. C.; Lian, J.; Gordon, J.; Stein, G.; Stein, J.
Long noncoding RNAs (lncRNAs) are important regulators of gene expression and are frequently dysregulated in cancer. The mitotically associated lncRNA MANCR is highly expressed in aggressive cancers and contributes to genomic instability in triple-negative breast cancer (TNBC), but the molecular mechanisms underlying its activity remain poorly defined. Here we integrate computational and experimental approaches to examine the structure and regulatory interactions of MANCR isoforms. Analysis of transcriptomic datasets revealed tumor-type-specific expression patterns for seven MANCR isoforms in breast cancer cell lines. Computational prediction of RNA secondary structures identified conserved structural features across isoforms, suggesting potential functional specialization. We identify p53 as a MANCR-interacting protein through computational docking and RNA immunoprecipitation sequencing (RIP-seq) and demonstrate that MANCR depletion reduces p53-dependent transcriptional activity. Chromatin isolation by RNA purification sequencing (ChIRP-seq) revealed 1,250 genomic regions associated with MANCR, including enrichment of p53 consensus motifs and GC-rich sequence elements. Motif analysis further identified candidate sequence features associated with MANCR-occupied chromatin regions. Computational prediction of RNA-miRNA interactions identified multiple potential miRNA binding sites across MANCR isoforms, including miR-6756-5p, which targets the androgen receptor (AR). Consistent with this prediction, AR expression decreased following MANCR knockdown in TNBC cells. Together, these results suggest that MANCR isoforms may contribute to transcriptional regulation in TNBC through interactions with chromatin, p53 signaling pathways, and potential miRNA regulatory networks.
One Sentence Summary: The mitotically associated lncRNA MANCR is prevalent in aggressive cancers and interacts with DNA, p53, and miRNAs to mediate multiple levels of epigenetic transcriptional control in triple-negative breast cancer.